Abstract: The current onion routing implementation of Tribler works
as expected but throttles the overall throughput of the Tribler system.
This article discusses a measuring procedure to reproducibly profile the
tunnel implementation so further optimizations of the tunnel community
can be made. Our work has been integrated into the Tribler ecosystem.


Introduction 

Any sense of privacy on the Internet is an illusion.
Even Tor, which could be considered the most
privacy-preserving networking framework in existence, has its
flaws [3]. Furthermore, privacy-enhancing technology is
difficult for ordinary users to operate and it slows down the
Internet. For instance, should one first set up a VPN and
then Tor, or Tor and then a VPN? Getting it wrong might
actually harm a user's privacy¹.

With yearly sales of smartphones and smart watches
approaching one billion, threats to user privacy are
becoming a global phenomenon. People can be traced to
a location within (in the worst case) 20 meters in real
time²,³ and recognized [1]. With these billions of people
facing threats ranging from targeted advertisement to
burglary or even harassment, the need for scalable and
light-weight privacy-enhancing technology becomes
apparent. However, no optimized implementation of a
scalable architecture exists in this emerging research field.
We provide a key step forward by identifying bottlenecks
in Tribler.

The Tor project⁴ aims to offer anonymity by forwarding
traffic through a series of relays. Multiple layers of
encryption are used such that no single relay can
reconstruct the entire circuit. This is also called onion
routing. These relays are provided by volunteers, which
means there is often not enough bandwidth available,
causing the Tor network to be slow; almost no one uses
it for everyday browsing.

The solution is to make everyone in the network a 
relay for others [2]. Several implementations of such an 

1. https://trac.torproject.org/projects/tor/wiki/doc/TorPlusVPN 

2. http://buddy-locator.com/ 

3. http://www.mobile-scan.com/ 

4. https://www.torproject.org/ 



Fig. 1. Download speed per amount of hops 

approach are available, for instance, Tribler⁵ and Hola⁶.
Their architecture allows them to scale to both a large
number of users and high-bandwidth applications such
as HD video streaming.

The key contribution of this article is a performance
analysis of the first Tor-derived onion routing
implementation with user donations, NAT/firewall
puncturing and fully decentralized peer discovery:
Tribler. To make our test realistic we procured
various anonymous private servers in exotic locations
such as Belize and Noord-Holland.

Problem description 

The current onion routing implementation can find a
2-hop path between the piece of information needed
and the computer running Tribler. It also allows
building circuits of variable length, but this is currently
not used as it severely limits the throughput of
downloads. This was demonstrated by an experiment in 2014
among Tribler users. In this experiment dozens of users
downloaded a 50 MB file using the tunnel code from
a 1 Gbps server⁷. The results of this experiment can be

5. https://tribler.org/ 

6. https://hola.org/ 

7. http://forum.tribler.org/viewtopic.php?f=2&t=2121
Fig. 2. Experiment setup

observed in Figure 1. However, it was unknown what
the limiting factors were. Possible culprits included the
OpenSSL library calls, the single-core processing, the I/O
capacity or the network bandwidth.

Experiment setup 

The contents of this section are as follows. First it will
explain the high-level construction of the node network
in the experiment. Secondly the frameworks that were
used are introduced. Lastly the limitations and practical
details of our setup are discussed.

In this experiment a small community of nodes is
created. An example of one such circuit in this network
is shown in Figure 2 and consists of a:

• Seeding node, sending data

• Exit node, relaying the unencrypted data to the sink

• Sink node, receiving all of the data from the
Seeding node

• Relay node, relaying encrypted packets within a
circuit

An experiment run consists of the seeding node
constructing 4 different circuits using 0 to 3 hops (relay
nodes + 1 exit node) to send to the sink node. Each
of these circuits is built and destroyed independently
of the others. After building the circuits, some random
data is sent between the first and last node in the
circuit. The circuit with 0 hops just encrypts and decrypts
information on the same node. This allows us to run the
experiment without network overhead. At the end of the
run the results are evaluated per node type.

Dispersy [4] allows for decentralized communities
of nodes to communicate using custom protocols. The
tunnel community in Tribler is one of these Dispersy
communities. Through the implementation of the tunnel
community, nodes can announce, discover and share
candidate exit nodes and relays. In the experiment of
this paper a single instance of the Tribler tunnel
community is created without the support of other Tribler
functionalities.


Gumby is the experiment runner for the Tribler project.
It allows for creating repeatable tests that can spawn
instances on different computer setups, for example
running all Tribler instances on different DAS4 computers
or running them all locally. To provide this functionality
it uses high-level scenarios which interact with Python
boilerplate code to access core Tribler functionality.
This paper has expanded upon Gumby by creating
a scenario for testing the tunnel implementation with
various numbers of hops and circuits and providing
the Python boilerplate code for running this scenario
using either random packet transmission or LibTorrent
controlled packet transmission over the tunnel's circuits.

Yappi⁸ is the profiler used to obtain the profiling
information from the application while the application is
running. Yappi is designed to support multi-threaded
applications and can be started and stopped without affecting
the application, which makes it a well-suited profiler for
the Tribler components. Depending on the configuration,
it returns the number of times a function was called and
either the wall time or CPU time per function.

To understand the timing of the tunnel component
we used Yappi's CPU time reporting and filtered
the results to only include function calls related to the
tunnel implementation. These per-function results are
then sorted and plotted as relative shares in a pie chart
and as absolute times in a bar graph using R scripts.
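
The filtering and normalizing step described above can be sketched as follows. This is a minimal sketch and not the Gumby scenario code: it uses the standard-library cProfile with a CPU-time timer as a stand-in for Yappi so that it runs without extra dependencies, and the workload functions only borrow the names encrypt_str and send_packet from the paper. Their bodies, the TUNNEL_FUNCTIONS set and both helper functions are hypothetical.

```python
import cProfile
import hashlib
import pstats
import time

# Hypothetical stand-ins for tunnel functions; only the names come from the paper.
def encrypt_str(data):
    return hashlib.sha256(data).digest()

def send_packet(data):
    return len(data)

def unrelated_bookkeeping():
    return sum(range(100))

def workload():
    for _ in range(200):
        send_packet(encrypt_str(b"x" * 1024))
        unrelated_bookkeeping()

# Names of the functions we consider part of the tunnel implementation.
TUNNEL_FUNCTIONS = {"encrypt_str", "send_packet"}

def profile_tunnel_functions(func):
    """Profile func and return {name: (calls, cpu_seconds)} for tunnel functions only."""
    profiler = cProfile.Profile(timer=time.process_time)  # CPU time, not wall time
    profiler.runcall(func)
    # pstats exposes {(file, line, name): (primitive_calls, calls, tottime, cumtime, callers)}
    stats = pstats.Stats(profiler).stats
    return {key[2]: (entry[1], entry[2])
            for key, entry in stats.items() if key[2] in TUNNEL_FUNCTIONS}

def relative_shares(per_function):
    """Convert absolute times into fractions of the filtered total, largest first."""
    total = sum(seconds for _, seconds in per_function.values())
    shares = {name: (seconds / total if total else 0.0)
              for name, (_, seconds) in per_function.items()}
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    for name, share in relative_shares(profile_tunnel_functions(workload)):
        print(f"{name}: {share:.2f}")
```

The sorted list of shares corresponds to the input of the pie chart, while the unnormalized seconds correspond to the input of the bar graph.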

Due to budget and time constraints the experiment has
only been run on a single machine, running 8 instances of
Tribler. This approach places extra strain on the system
resources, but this should be of no consequence when
interpreting the relative results of function performance.
This also means that packets do not travel over the open
Internet in the experiment (instead they bounce back
from the on-site router). However, delays caused by
packets having a longer transmission time should not
matter for the performance of the implementation. In
fact, if these transmission delays affected the time spent
in functions, one could argue that the measurement would
no longer capture the core performance of the function
and would thus give a skewed result.

Experimental results 

The first observation to be made when interpreting
the results is that the seeding node is influenced
the most by the increasing number of hops. As the
number of hops increased per run, the cryptographic
component of the seeding node (encrypt_str and crypto_out)
started to take a greater toll on its performance (see
Figure 3 and Figure 4). This is in contrast to the relay
and exit nodes, whose performance did not appear to
change at all as the number of hops increased. This
result is in line with the onion routing model, as the
8. https://code.google.com/p/yappi/
Fig. 3. Relative CPU time per function of the seeding
node for 0 hops and 4 circuits



Fig. 4. Relative CPU time per function of the seeding 
node for 3 hops and 4 circuits 


seeding node is in charge of encrypting a packet multiple
times, once for each hop in a circuit. The exit and relay
nodes only need to decrypt this packet once and forward it
to another node.
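
The asymmetry between the seeding node and the other nodes can be illustrated with a toy model of layered encryption. This is a sketch under stated assumptions: the XOR keystream cipher below is a deliberately simple, insecure placeholder and not the cipher Tribler uses, and the names keystream_xor, seeder_wrap and hop_unwrap are hypothetical. Because XOR layers commute, the sketch shows only the operation counts, not the ordering guarantees of a real cipher.

```python
import hashlib
from itertools import count

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR data with a SHA-256 based keystream (NOT secure)."""
    stream = bytearray()
    for block in count():
        if len(stream) >= len(data):
            break
        stream.extend(hashlib.sha256(key + block.to_bytes(4, "big")).digest())
    return bytes(a ^ b for a, b in zip(data, stream))

def seeder_wrap(payload: bytes, hop_keys: list) -> tuple:
    """Seeding node: apply one layer per hop, the last hop's layer innermost."""
    operations = 0
    for key in reversed(hop_keys):
        payload = keystream_xor(key, payload)
        operations += 1
    return payload, operations

def hop_unwrap(packet: bytes, key: bytes) -> bytes:
    """Relay or exit node: peel exactly one layer, whatever the circuit length."""
    return keystream_xor(key, packet)

if __name__ == "__main__":
    keys = [b"relay-1", b"relay-2", b"exit"]
    packet, seeder_ops = seeder_wrap(b"hello sink", keys)
    for key in keys:  # each hop performs a single decryption
        packet = hop_unwrap(packet, key)
    print(packet, seeder_ops)
```

In this model the seeding node performs as many cipher operations per packet as there are hops, whereas every relay or exit node performs one, which matches the trend described above.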

To better explain and distinguish the different results
we shall categorize the functions into two separate,
functionally similar groups. This is mainly done to provide
abstraction over the circuit implementation, where
functions might perform the same task but in a
different setting (like send_packet for the seeding node
and relay_packet for the exit/relay nodes). The first group
we distinguish is the crypto set. This group contains
the functions that pertain to encryption or decryption
of messages. The functions contained in this first
category are the encrypt_str, decrypt_str, encode_address,
decode_address and crypto_out functions. The second group
is the networking set. This group contains functions that
handle creating and sending the actual UDP packets
over the Internet. The functions of this category are the
send_packet and relay_packet functions. All of the other
functions we assign to the other set.



                 0 Hops   3 Hops   Exit
encrypt_str      0.18     0.28     0.00
decrypt_str      0.00     0.00     0.15
encode_address   0.06     0.04     0.00
decode_address   0.08     0.05     0.05
crypto_out       0.12     0.15     0.00
Total            0.44     0.53     0.20

TABLE 1

The relative processing times of the crypto set of
functions



                 0 Hops   3 Hops   Exit
send_packet      0.13     0.10     0.08
relay_packet     0.00     0.00     0.11
Total            0.13     0.10     0.19

TABLE 2

The relative processing times of the networking set of
functions


What can be observed when analyzing this data is that
the crypto set's functions take up the most time. The
second most impactful functions are those of the networking
set. As seen in Table 1 and Table 2 these two sets of
functions together take more than 50% of the CPU time for
tunnel community code in the seeding experiment. In the exit
node they use almost 40% of the CPU time. From this we
can conclude that the tunnel components which would
benefit most from parallelization are these two classes of
functions. We also note that this is not an easy task, since
these two classes mostly rely on external libraries
in Tribler's implementation. Parallelizing functions in
the other set might not be a good idea, though, as the
parallelization overhead might outweigh the execution
times of the functions. This is of course dependent on the
machine Tribler is being executed on, but if one considers
smartphones as the targeted platform for optimization,
this would indeed be a bad idea.
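
The percentages quoted above can be checked directly from the two tables. In the minimal sketch below the numbers are copied from Table 1 and Table 2, while the dictionary layout and the helper names set_totals and combined_share are illustrative assumptions. Note that summing the rounded rows of the 3-hop crypto column gives 0.52 rather than the printed total of 0.53, presumably a rounding effect in the source.

```python
# Relative CPU-time shares copied from Table 1 and Table 2.
# Tuple order: seeding node with 0 hops, seeding node with 3 hops, exit node.
SHARES = {
    "encrypt_str": (0.18, 0.28, 0.00),
    "decrypt_str": (0.00, 0.00, 0.15),
    "encode_address": (0.06, 0.04, 0.00),
    "decode_address": (0.08, 0.05, 0.05),
    "crypto_out": (0.12, 0.15, 0.00),
    "send_packet": (0.13, 0.10, 0.08),
    "relay_packet": (0.00, 0.00, 0.11),
}
NETWORKING = {"send_packet", "relay_packet"}  # everything else listed is crypto
COLUMNS = ("0 hops", "3 hops", "exit")

def set_totals(column: int) -> dict:
    """Sum the per-function shares of one column into the two function sets."""
    totals = {"crypto": 0.0, "networking": 0.0}
    for name, values in SHARES.items():
        group = "networking" if name in NETWORKING else "crypto"
        totals[group] += values[column]
    return totals

def combined_share(column: int) -> float:
    """Share of tunnel CPU time taken by the crypto and networking sets together."""
    return round(sum(set_totals(column).values()), 2)

if __name__ == "__main__":
    for index, label in enumerate(COLUMNS):
        print(label, combined_share(index))
```

With these inputs the combined shares come out at roughly 0.57 and 0.62 for the seeding node and 0.39 for the exit node, in line with "more than 50%" and "almost 40%".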

From Figure 3 and Figure 4 it can be observed that
encode_address and decode_address take a lot of CPU time.
After manual inspection it has been determined that the
encode_address function converts IP strings to a binary
format. Because the same addresses are converted
repeatedly, these results can be cached to save time. One
way to do this would be to store the converted IP strings
in their binary format. For decode_address, the same
approach could be used for reverse lookups of host and
port tuples. Better yet, for both functions, would be to
work only with the binary encoded addresses throughout
the tunnel community and convert them to string format
only when it is really needed (such as for user
interaction).
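
A minimal sketch of the caching idea follows. It assumes IPv4 addresses packed as 4 address bytes plus a 2-byte big-endian port; the signatures and the wire format are assumptions made for illustration and may differ from the actual encode_address and decode_address in the tunnel community. The cache size of 4096 is likewise arbitrary.

```python
import socket
import struct
from functools import lru_cache

@lru_cache(maxsize=4096)
def encode_address(host: str, port: int) -> bytes:
    """Pack an IPv4 host string and a port into 6 bytes (4 address + 2 port)."""
    return socket.inet_aton(host) + struct.pack("!H", port)

@lru_cache(maxsize=4096)
def decode_address(packed: bytes) -> tuple:
    """Reverse lookup: 6 packed bytes back to a (host, port) tuple."""
    host = socket.inet_ntoa(packed[:4])
    (port,) = struct.unpack("!H", packed[4:6])
    return host, port

if __name__ == "__main__":
    packed = encode_address("127.0.0.1", 8080)
    encode_address("127.0.0.1", 8080)  # second call is served from the cache
    print(packed, decode_address(packed), encode_address.cache_info().hits)
```

Since a circuit talks to a small, stable set of peers, a bounded cache of this kind should see a high hit rate; working with the packed form throughout would avoid the conversions altogether.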

Lastly we can look at optimizations that speed up the
onion routing encryption, which responds the worst to
scaling up the number of hops. This cannot be fixed with
simple parallelization, however, because onion
routing requires successive RSA encryptions on the same
packet. The only way to speed this up is to perform
pipelined header encryption. However, this would
require a great amount of control over Python's packet
compression and encryption library. Even though this
would aid performance, most of a normal PC's resources
will already be in use thanks to the parallelism of the
different circuits being used.

On a more coarse-grained scope of parallelization we
have found, after inspecting the source code of Tribler,
that sending or relaying occurs sequentially after the
encryption and decryption of a packet. If this process
were pipelined perfectly, the results indicate a speedup of
up to 15.46% for the 0-hop and 10.99% for the 3-hop seeding
node experiment and a speedup of 12.5% for the exit
node experiments. This pipelining could be achieved by
using transmission buffers or a thread-safe double-ended
queue, which store encrypted packets waiting
to be sent by another thread. In contrast to the
per-function parallelization this would be relatively trivial
to implement, but a significant improvement nonetheless.
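
The buffer-based pipelining suggested above can be sketched with a bounded thread-safe queue between a crypto stage and a sending stage. This is a sketch, not Tribler code: the encrypt placeholder, the sentinel convention, the buffer size of 64 and the function names are all hypothetical, and the sender only records packets instead of writing to a socket.

```python
import queue
import threading

SENTINEL = None  # marks the end of the packet stream

def encrypt(packet: bytes) -> bytes:
    """Placeholder for the crypto stage (hypothetical: just reverses the bytes)."""
    return packet[::-1]

def sender(outbox: queue.Queue, sent: list) -> None:
    """Network stage: drain the buffer until the sentinel arrives."""
    while True:
        packet = outbox.get()
        if packet is SENTINEL:
            break
        sent.append(packet)  # a real sender would write to a UDP socket here

def run_pipeline(packets) -> list:
    """Encrypt on the calling thread while a second thread sends earlier packets."""
    outbox = queue.Queue(maxsize=64)  # bounded transmission buffer
    sent = []
    worker = threading.Thread(target=sender, args=(outbox, sent))
    worker.start()
    for packet in packets:
        outbox.put(encrypt(packet))  # blocks while the buffer is full
    outbox.put(SENTINEL)
    worker.join()
    return sent

if __name__ == "__main__":
    print(run_pipeline([b"ab", b"cd", b"ef"]))
```

In CPython the two stages overlap only while one of them releases the global interpreter lock, as blocking socket calls do. One reading that is consistent with the reported speedups, and it is only an assumption, is that perfect pipelining hides the sending fraction f of the run time, giving a speedup of 1/(1 - f) - 1. With the rounded Table 2 values f = 0.13, 0.10 and 0.11 (relay_packet only, for the exit node) this yields about 14.9%, 11.1% and 12.4%, close to the figures quoted above.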

One quirk in the results is the total amount of time
spent in the functions of the seeding node. The
experiments showed that as the number of hops increased,
the total time spent in the functions of the seeding node
decreased. It is speculated that this is due to some form
of load balancing among the nodes, caused by
the node processes being allocated to different cores
on the simulating machine. The exact cause remains a
mystery and a target for future work.

The exact total runtimes for the top 20 functions in
the seeding node and exit node experiments can also be
found in Figure 5 and Figure 6 respectively.


Conclusion 

In this paper we have evaluated the performance of the
implementation of a privacy-preserving communication
protocol in a maturing application called Tribler. We have
successfully tapped into and analyzed function calls in
this implementation, both manually and using measuring
frameworks. Our results have uncovered two sets of
functions where there is major room for improvement by
way of different forms of parallelization. We have found
that these sets consist of functions concerned with (1)
sending packets and (2) encrypting packets. Furthermore
we found that these two categories of bottleneck
functions are of more or less equal size, with the encryption
of packets taking only slightly longer than the sending
of packets. Our work has been made a standard part
of the Tribler ecosystem and will be used for further
optimizations like multi-core support.
Future work

The next step in this research would be to use LibTorrent
to send packets over the circuits. This would enable
our experiment to apply rate limiting and measure
packet loss overhead. At the moment our implementation
does not consider packet loss; in other words it is
sending UDP packets blindly. This means packets that
are received by the sink node are dropped instantly
and common occurrences such as packet retransmissions
are not measured. One thing to keep in mind is that
using LibTorrent should not offer any new insights into
throughput loss. This is because LibTorrent is concerned
with the contents of the data that is being sent over the
circuits and not with how the tunnel implementation sends
it over the network. The measurements performed by the
experiment in this paper cover only the actual tunnel
implementation and not the functions that handle
the received data. Thus, whereas this might change the
absolute time spent in different functions, the relative
time spent should remain the same.
Fig. 5. Absolute time spent for different (#hops, #circuits) experiments for the seeding node
Fig. 6. Absolute time spent for different (#hops, #circuits) experiments for the exit node